This is a work in progress - all content and analyses are preliminary.
Economics
Correlation Heatmap
First we have some data wrangle to do. Here, we choose a selection of 35 variables to work with. We also filter for the Connecticut governing regions rather than counties. Finally, we arrange the variables in sensible order so they appear in similar blocks on the correlation plot.
Code
pacman::p_load( dplyr, tidyr, tibble, stringr, purrr, tidyr)conflicted::conflicts_prefer( dplyr::filter(), dplyr::select(),.quiet =TRUE)source('dev/data_pipeline_functions.R')source('dev/filter_fips.R')metrics <-readRDS('data/sm_data.rds')[['metrics']]metadata <-readRDS('data/sm_data.rds')[['metadata']]# Use metadata to get help filter by dimensionecon_meta <- metadata %>%filter(dimension =='economics')# Filter to economics dimensionecon_metrics <- metrics %>%filter(variable_name %in% econ_meta$variable_name)# Filter to latest year and new (post-2024) counties# Also remove NAICS variables to leave us with an approachable number# And pivot wider so it is easier to get correlationsecon_metrics_latest <- econ_metrics %>%filter_fips(scope ='new') %>%get_latest_year() %>%filter(str_detect(variable_name, 'Naics|^lq|^avgEmpLvl', negate =TRUE))# Pivot wider for easier correlations belowecon_metrics_latest <- econ_metrics_latest %>%select(fips, variable_name, value) %>%unique() %>%pivot_wider(names_from ='variable_name',values_from ='value' ) %>%unnest(!fips)# Arrange in some reasonable orderecon_metrics_latest <- econ_metrics_latest %>%select(matches('Code_|metro'),matches('employ|abor|Worker'), # employmentmatches('Sales'),matches('Earn|Income'),everything(),-fips,-matches('expHiredLaborPercOpExp') )
Now we can build an interactive correlation plot. We are calculating correlations based on complete pairwise observations to account for missing data and the changes to Census Bureau reporting for Connecticut.
On Missing Data
This is a reminder to do a deeper dive on missing data once more of the data have been wrangled. There is a substantial amount given the varying scales at which data are collected, not to mention the issues at the county level with Connecticut.
Code
pacman::p_load( dplyr, ggplot2, plotly, reshape, Hmisc, viridisLite)# Make a correlation matrix using all the selected variablescor <- econ_metrics_latest %>%as.matrix() %>%rcorr()# Melt correlation values and rename columnscor_r <-melt(cor$r) %>%setNames(c('var_1', 'var_2', 'value'))# Save p valuescor_p <-melt(cor$P)p.value <- cor_p$value# Make heatmap with custom text aesthetic for tooltipplot <- cor_r %>%ggplot(aes(var_1, var_2, fill = value, text =paste0('Var 1: ', var_1, '\n','Var 2: ', var_2, '\n','Correlation: ', format(round(value, 3), nsmall =3), '\n','P-Value: ', format(round(p.value, 3), nsmall =3)))) +geom_tile() +scale_fill_viridis_c() +theme(axis.text.x =element_text(hjust =1, angle =45)) +labs(x =NULL,y =NULL,fill ='Correlation' )# Convert to interactive plotly figure with text tooltipggplotly( plot, tooltip ='text',width =1000,height =800)
Interactive Correlation Plot
Cladogram
Show Wiltshire framework - what have we covered, what have we added.
Code
pacman::p_load( ggtree, dplyr, ape, data.tree, viridisLite, stringr)## Load data and add an origin leveldat <-readRDS('data/tree_dat.rds') %>%filter(Dimension =='Economics') %>%mutate(Framework ='Sustainability') %>%select(Framework, Dimension:Indicator) %>%mutate(across(everything(), ~str_trim(str_replace_all(., ';|%|/|\\.|\"|,|\\(|\\)', '_')) ))dat$pathString <-paste( dat$Framework, dat$Dimension, dat$Index, dat$Indicator,sep ='/')tree <-as.Node(dat)# Convert the data.tree structure to Newick formattree_newick <-ToNewick(tree)# Read the Newick tree into apephylo_tree <-read.tree(text = tree_newick)# Make all edge lengths 1phylo_tree$edge.length <-rep(1, length(phylo_tree$edge.length))# Add a space to end of node labels so it isn't cut offphylo_tree$node.label <-paste0(phylo_tree$node.label, ' ')# Plot itplot( phylo_tree, type ='c',cex =0.75,edge.width =2,show.tip.label =TRUE,label.offset =0,no.margin =TRUE,tip.color ='black',edge.color =viridis(181),x.lim =c(-0.1, 5))nodelabels( phylo_tree$node.label,cex =0.8,bg ='white')
Cladogram of Sustainability Metrics framework
Metadata Table
Using the table:
Click column headers to sort
Global search at top right, column search in each header
Change page length and page through results at the bottom
Use the download button to download a .csv file of the filtered table
# A tibble: 132,504 × 7
Metric `Variable Name` Definition Year Area Value Units
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Civilian Labor Force civLaborForce Number of civi… 2000 US 1426… coun…
2 Number Employed employed People are cla… 2000 US 1369… coun…
3 Number unemployed unemployed Meet all of th… 2000 US 5697 coun…
4 Unemployment Rate unemploymentRate Number of unem… 2000 US % 4 perc…
5 Civilian Labor Force civLaborForce Number of civi… 2001 US 1437… coun…
6 Number Employed employed People are cla… 2001 US 1369… coun…
7 Number unemployed unemployed Meet all of th… 2001 US 6809 coun…
8 Unemployment Rate unemploymentRate Number of unem… 2001 US % 4.7 perc…
9 Civilian Labor Force civLaborForce Number of civi… 2002 US 1448… coun…
10 Number Employed employed People are cla… 2002 US 1364… coun…
# ℹ 132,494 more rows